Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures
Abstract
We propose an easy-to-understand, visual performance model that offers insights to programmers and architects on improving parallel software and hardware for floating point computations.
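The abstract does not state the model itself. As a minimal sketch, assuming the commonly cited form of the Roofline bound (attainable performance is the lesser of peak floating-point throughput and peak memory bandwidth multiplied by operational intensity), the calculation might look like the following. The function names and the machine figures in the usage lines are hypothetical and chosen only for illustration.

```python
def roofline_bound(peak_gflops: float,
                   peak_bandwidth_gbs: float,
                   intensity_flops_per_byte: float) -> float:
    """Upper bound on attainable GFLOP/s for a kernel.

    Assumes the usual Roofline form:
        min(peak compute, peak bandwidth * operational intensity)
    """
    memory_roof = peak_bandwidth_gbs * intensity_flops_per_byte
    return min(peak_gflops, memory_roof)


def ridge_point(peak_gflops: float, peak_bandwidth_gbs: float) -> float:
    """Operational intensity (FLOPs/byte) where the two roofs meet."""
    return peak_gflops / peak_bandwidth_gbs


# Hypothetical machine: 100 GFLOP/s peak compute, 20 GB/s peak bandwidth.
low = roofline_bound(100.0, 20.0, 0.5)    # below the ridge: memory-bound
high = roofline_bound(100.0, 20.0, 10.0)  # above the ridge: compute-bound
ridge = ridge_point(100.0, 20.0)
print(low, high, ridge)
```

Kernels whose operational intensity lies below the ridge point are limited by memory traffic under this bound, and those above it are limited by peak compute.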
Similar resources
Modeling the Performance of Geometric Multigrid on Many-core Computer Architectures
The basic building blocks of the classic geometric multigrid algorithm are all essentially stencil computations and have a low ratio of executed floating point operations per byte fetched from memory. On modern computer architectures, such computational kernels are typically bounded by memory traffic and achieve only a small percentage of the theoretical peak floating point performance of the u...
Exploring performance and power properties of modern multicore chips via simple machine models
Modern multicore chips show complex behavior with respect to performance and power. Starting with the Intel Sandy Bridge processor, it has become possible to directly measure the power dissipation of a CPU chip and correlate this data with the performance properties of the running code. Going beyond a simple bottleneck analysis, we employ the recently published Execution-Cache-Memory (ECM) mode...
Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor
The Roofline Performance Model is a visually intuitive method used to bound the sustained peak floating-point performance of any given arithmetic kernel on any given processor architecture. In the Roofline, performance is nominally measured in floating-point operations per second as a function of arithmetic intensity (operations per byte of data). In this study we determine the Roofline for the...
Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis
We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro benchmarks implemented with Message Passing Interface (MPI), and OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behav...
A single precision preconditioner for Krylov subspace iterative methods on the CELL processor
The calculation techniques using the Cell processor have attracted attention for high performance computing. It is a heterogeneous multicore chip that is significantly different from conventional multi-processor or multicore architectures. The Cell processor provides extremely high performance single-precision floating operations, however the majority of scientific applications require results ...